Distributed Speculations: Providing Fault-tolerance and Improving Performance

Author

  • Cristian Ţăpuş
Abstract

This thesis introduces a new programming model based on speculative execution and it examines the use of speculations, a form of distributed transactions, for improving the performance, reliability and fault tolerance of distributed systems. A speculation is defined as a computation that is based on an assumption that is not validated before the computation is started. If the assumption is later invalidated, the computation is aborted and the state of the program is rolled back; if the assumption is validated, the results of the computation are committed. The primary difference between a speculation and a transaction is that a speculation is not isolated: for example, a speculative computation may send and receive messages, and it may modify shared objects. As a result, processes that share those objects may be absorbed into a speculation.

The contributions presented in this thesis include:

  • the introduction of a new programming model based on speculations,
  • the definition of new speculative programming language constructs,
  • the formal specification of the semantics of various speculative execution models, including message passing and shared objects,
  • the implementation of speculations in the Linux kernel in a transparent manner, and
  • the design and implementation of components of a distributed filesystem that supports speculations and guarantees sequential consistency of concurrent accesses to files.
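
The commit-or-rollback lifecycle described in the abstract can be illustrated with a minimal single-process sketch. This is not the thesis's Linux kernel implementation or its language constructs; the names `Speculation` and `run_speculatively` are hypothetical, and the sketch deliberately leaves out the non-isolated part (message passing and the absorption of other processes into a speculation).

```python
import copy


class Speculation:
    """Hypothetical helper: checkpoint state, then commit or abort later."""

    def __init__(self, state):
        self.state = state                      # mutable dict updated speculatively
        self._checkpoint = copy.deepcopy(state) # snapshot taken when speculation starts

    def commit(self):
        # Assumption validated: keep the speculative changes, drop the checkpoint.
        self._checkpoint = None

    def abort(self):
        # Assumption invalidated: restore the state captured at the start.
        self.state.clear()
        self.state.update(self._checkpoint)
        self._checkpoint = None


def run_speculatively(state, computation, validate):
    """Run `computation` before its assumption is checked (illustrative only)."""
    spec = Speculation(state)
    computation(state)        # work proceeds on the unvalidated assumption
    if validate():            # the assumption is checked only afterwards
        spec.commit()
        return True
    spec.abort()
    return False


# Usage: withdraw 30 on the assumption that the transfer will be approved.
account = {"balance": 100}
withdraw = lambda s: s.update(balance=s["balance"] - 30)

aborted = run_speculatively(account, withdraw, validate=lambda: False)
after_abort = dict(account)    # state was rolled back

committed = run_speculatively(account, withdraw, validate=lambda: True)
after_commit = dict(account)   # speculative change was kept
```

The checkpoint is a deep copy for simplicity; a real system would more likely use copy-on-write or logging to keep the cost of starting a speculation low.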

Similar resources

Speculations: Providing Fault-tolerance and Recoverability in Distributed Environments

Building safe and reliable programs is an important but difficult endeavor. The challenge is even greater in the context of distributed environments, which may involve complex synchronization operations in the presence of process and network failures. Transactions are one of the earliest and simplest abstractions for reliable concurrent programming [2]. They provide fault-isolation by guarantee...

Improving the palbimm scheduling algorithm for fault tolerance in cloud computing

Cloud computing is the latest technology that involves distributed computation over the Internet. It meets the needs of users through sharing resources and using virtual technology. The workflow user applications refer to a set of tasks to be processed within the cloud environment. Scheduling algorithms have a lot to do with the efficiency of cloud computing environments through selection of su...

A Theory of Nested Speculative Execution

Implementing distributed applications is a challenging task. Developers of such systems are confronted with issues like fault-tolerance, efficient synchronization mechanisms, and the correctness of the distributed code. This paper introduces a new programming model based on speculative execution that addresses these issues. Speculations provide distributed atomic rollback and enable optimistic ...

The performance of independent checkpointing in distributed systems

This paper describes performance measurements of an implementation of independent checkpointing in a network of workstations. Independent checkpointing is a simple technique for providing fault tolerance in distributed systems. Because processes do not coordinate during checkpointing, this technique has a low run-time overhead. To avoid the classical domino effect, our implementation relies on a...

Improving Performance in Adaptive Fault Tolerance Structure with investigating the effect of the number of replication

Given the wide use of distributed systems in various areas, fault tolerance is an important characteristic for a system to have, and it matters even more in the design of real-time distributed systems. With regard to using some middleware like CORBA in designing such systems, and in order to increase their compatibility, speed, performance, to simplify the network pro...

Publication date: 2006